Discovery of Western European R1b1a2 Y Chromosome Variants in 1000 Genomes Project Data: An Online Community Approach
Abstract
The authors have used an online community approach, and tools that were readily available via the Internet, to discover genealogically and therefore phylogenetically relevant Y-chromosome polymorphisms within core haplogroup R1b1a2-L11/S127 (rs9786076). Presented here is the analysis of 135 unrelated L11 derived samples from the 1000 Genomes Project. We were able to discover new variants and build a much more complex phylogenetic relationship for L11 sub-clades. Many of the variants were further validated using PCR amplification and Sanger sequencing. The identification of these new variants will help further the understanding of population history including patrilineal migrations in Western and Central Europe where R1b1a2 is the most frequent haplogroup. The fine-grained phylogenetic tree we present here will also help to refine historical genetic dating studies. Our findings demonstrate the power of citizen science for analysis of whole genome sequence data.
Similar resources
I-49: Human Y Chromosome Proteome Project
The success of the Human Genome Project (HGP) has provided a blueprint for the approximately 20,000 gene-encoded proteins potentially active in all of the hundreds of cell types that make up the human body. Yet we still have limited knowledge about a majority of the gene-encoded proteins which are the “building blocks of life” and “cellular machinery”. It is estimated that for nearly half of th...
Discovery of Phylogenetic Relevant Y-chromosome Variants in 1000 Genomes Project Data
Current Y chromosome research is limited in the poor resolution of Y chromosome phylogenetic tree. Entirely sequenced Y chromosomes in numerous human individuals have only recently become available by the advent of next-generation sequencing technology. The 1000 Genomes Project has sequenced Y chromosomes from more than 1000 males. Here, we analyzed 1000 Genomes Project Y chromosome data of 126...
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel
A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes ac...
Generation of high-resolution a priori Y-chromosome phylogenies using “next-generation” sequencing data
An approach for generating high-resolution a priori maximum parsimony Y-chromosome (“chrY”) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (“next-generation”) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty...
Alignment of 1000 Genomes Project reads to reference assembly GRCh38
The 1000 Genomes Project produced more than 100 trillion basepairs of short read sequence from more than 2600 samples in 26 populations over a period of five years. In its final phase, the project released over 85 million genotyped and phased variants on human reference genome assembly GRCh37. An updated reference assembly, GRCh38, was released in late 2013, but there was insufficient time for ...